Methodology and challenges for harmonization of nutritional data from seven historical studies

Background: Collection of detailed dietary data is labor intensive and expensive. Harmonization of existing data sets has been proposed as an effective tool for research questions in which individual studies are underpowered.
Methods: In this paper, we describe the methodology used to retrospectively harmonize nutritional data from multiple sources, based on the individual participant data of all available studies that collected nutritional data in Israel between 1963 and 2014. This collaboration was established to study the association of red and processed meat with colorectal cancer. The participating studies used two types of nutritional questionnaires, the Food Frequency Questionnaire (FFQ) and the 24-h dietary recall (24HR recall), and different food composition tables. The main exposures of interest were type of meat (total meat, red meat, and poultry) and level of processing.
Results: A total of 29,560 Israeli men and women were enrolled. In studies using FFQ, the weighted mean intakes of total meat, red meat, processed meat, and poultry were 95, 27, 37 and 58 gr/day, respectively; in studies using 24HR recall they were 92, 25, 10 and 66 gr/day. Despite several methodological challenges, we successfully harmonized nutritional data from the different studies.
Conclusions: This paper emphasizes the significance and feasibility of harmonizing previously collected nutritional data, which offers an opportunity to examine associations between a range of dietary exposures and the outcome of interest while minimizing costs and time in epidemiological studies.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12937-024-00976-8.


Introduction
The pooling and harmonization of existing data sets has been proposed as an effective and important tool for research questions in which individual studies are underpowered and for the study of rare outcomes [1].
Dietary exposure is complex, and in recent years nutritional studies have shifted their focus from single nutrients to foods, dietary patterns and attributes of the diet such as level of processing [2,3]. Unlike nutrients, which are universal elements of the diet, the types of foods people eat and their food patterns vary greatly, so large sample sizes and diverse populations are required to represent this heterogeneity [4,5]. A recent review highlighted the importance of harmonization in nutritional studies, as many studies use relatively small populations with limited power and generalizability [6].
These methodological challenges are even more pronounced with research questions dealing with the etiology of chronic diseases with extended latency periods, requiring long follow-up [7].
We established a collaboration by inviting all potential partners holding historical nutritional databases collected in Israel over the past decades, in order to study the association between nutrition and colorectal cancer.
The aim of this paper is to describe the methodology used to harmonize nutritional data from multiple sources with diverse nutritional questionnaires and nutrient databases. A secondary aim is to describe the characteristics of the participating studies and the dietary intake of participants.

Methods
This is a collaborative historical cohort study, based on the individual participant data of 7 studies (N = 29,560), which collected nutritional data between 1963 and 2014.
To be included, studies had to have been conducted in Israel, to have collected detailed dietary intake data on a single occasion, and to hold identified records enabling linkage with the Israel National Cancer Registry (INCR) and with the central population registry for vital status, using the unique identification number held by each Israeli citizen. Principal investigators were contacted to confirm the eligibility of their studies and their willingness to share their data. After obtaining Institutional Review Board (IRB) approval, each principal investigator transferred a study file to the study center with a set of variables based on a common dataset defined by the study team. Participating studies were originally of cohort, cross-sectional or case-control design; for case-control studies, we included only control subjects who were free of cancer (the outcome of interest) at the time of the nutritional interview.
In addition to dietary data, all studies collected information on sociodemographic, lifestyle and health characteristics of the participants. These variables were harmonized as well, yielding comparable datasets.
Twelve individuals were found to have taken part in more than one of the participating studies. For each of them, the record from the study with more complete data was included in the current collaborative analysis.
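The rule for participants who appear in more than one study can be illustrated with a short sketch. It is not the authors' code (the analyses were run in SAS); the field names, the completeness score (a simple count of non-missing fields) and the tie-breaking rule (keep the first record encountered) are assumptions made only for illustration.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical identifier fields that do not count toward completeness.
ID_FIELDS = ("person_id", "study")


def completeness(record: Record) -> int:
    """Count the non-missing (non-None) fields, ignoring identifier fields."""
    return sum(
        1
        for key, value in record.items()
        if key not in ID_FIELDS and value is not None
    )


def keep_most_complete(records: List[Record]) -> List[Record]:
    """Keep one record per person: the one with the most non-missing fields.

    Ties keep the first record encountered (an assumption of this sketch).
    """
    best: Dict[Any, Record] = {}
    for record in records:
        person = record["person_id"]
        current = best.get(person)
        if current is None or completeness(record) > completeness(current):
            best[person] = record
    return list(best.values())


# Made-up example records, not study data.
records = [
    {"person_id": "A", "study": "study_1", "bmi": 27.1, "smoking": None, "education": None},
    {"person_id": "A", "study": "study_2", "bmi": 26.8, "smoking": "never", "education": 12},
    {"person_id": "B", "study": "study_1", "bmi": None, "smoking": "current", "education": 10},
]
deduplicated = keep_most_complete(records)
```

In this made-up example, person A is retained from `study_2`, which has fewer missing fields.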

Study data
A unified coding system for non-dietary variables that may act as potential confounders was developed by mapping the variables available across studies, and data coding was standardized accordingly. These variables included date of nutritional interview, age, sex, ethnicity, country of birth, education, marital status, cigarette smoking habits, body weight and height or calculated body mass index (BMI), and physical activity. Information on lifestyle (physical activity and smoking) was available for 6 of the 7 studies.

Nutritional data harmonization
There were several differences regarding nutritional data assessment between the studies included in the current analysis: type of nutritional questionnaire, the foods and nutrient databases, and periods of data collection.

Nutritional questionnaires and food composition databases
The studies included in this collaboration used three types of nutritional questionnaires and five nutritional databases for calculation of nutrient intake. The types of questionnaires used were a semi-quantitative Food Frequency Questionnaire (sq-FFQ), a quantitative Food Frequency Questionnaire (q-FFQ) and a 24HR recall.

Food and nutrient composition databases
Differences between food composition databases included variations in nutrient composition and in the portion sizes defined. Some of these differences reflect characteristics of food composition and of the food supply in the period in which the original study was conducted. To represent these characteristics accurately, nutrient composition was calculated (or received pre-calculated from the principal investigator, PI) for each study using its original database. Harmonization was performed at the food and food group level for the specific purpose of studying meat intake. A nutritional epidemiologist reviewed the food-level dietary data using the original data dictionaries and descriptive statistics; portion sizes were translated into grams and a common categorization system for foods was created. The system was used to group single foods into 22 common food groups, with an emphasis on the food groups of interest to the project (red meat, processed meat and poultry).
Meat was sub-categorized by level of processing (unprocessed, processed and ultra-processed) (see Table 4); organ meats formed a separate group. Red meat included beef, veal, pork, lamb, mutton, and goat, in accordance with the IARC definition [1]. Processed meat was divided into two sub-groups within each meat type: processed, which included items such as hamburger and breaded and fried chicken breast, and ultra-processed, which included items such as sausages, hotdogs, pastrami and chicken nuggets, in line with the NOVA classification [7]. In addition, to avoid overestimation of meat intake [20], composite dishes that include meat (e.g., meat-stuffed vegetables) were separated into sub-groups by meat type. The meat content was then calculated according to its relative share of the dish (usually 30% of weight) and was included in the unprocessed meat sub-group.
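The handling of composite dishes can be sketched as follows. This is an illustrative example rather than the project's code: the function names and the dictionary layout are hypothetical, and the 30% default reflects only the "usually 30% of weight" share mentioned in the text; the actual share used for a given dish may differ.

```python
from typing import Dict, Tuple

# Default share of meat in a composite dish ("usually 30% of weight" in the text).
DEFAULT_MEAT_SHARE = 0.30

# Intake in grams per day, keyed by (meat type, processing level); hypothetical layout.
IntakeByGroup = Dict[Tuple[str, str], float]


def meat_grams_from_dish(dish_grams: float, meat_share: float = DEFAULT_MEAT_SHARE) -> float:
    """Return the grams of meat contained in a composite dish."""
    if not 0.0 <= meat_share <= 1.0:
        raise ValueError("meat_share must be between 0 and 1")
    return dish_grams * meat_share


def add_composite_dish(
    intake: IntakeByGroup,
    meat_type: str,
    dish_grams: float,
    meat_share: float = DEFAULT_MEAT_SHARE,
) -> IntakeByGroup:
    """Add the meat content of a composite dish to the unprocessed sub-group
    of the given meat type, returning a new dictionary."""
    updated = dict(intake)
    key = (meat_type, "unprocessed")
    updated[key] = updated.get(key, 0.0) + meat_grams_from_dish(dish_grams, meat_share)
    return updated


# Made-up example: 200 g/day of a meat-stuffed vegetable dish made with red meat.
intake = {("red meat", "unprocessed"): 20.0, ("poultry", "processed"): 15.0}
intake_with_dish = add_composite_dish(intake, "red meat", 200.0)
```

Counting only the meat share, rather than the full dish weight, is what prevents the overestimation of meat intake that the text refers to.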
First, reported food consumption was converted into average daily amounts consumed, based on frequencies, number of portions and portion sizes. For the FFQ, seasonal items were adjusted to the length of the Israeli season in which the item is mostly available. Energy intake and the intakes of several macro- and micronutrients were calculated for each food item using food composition information from international and local sources, by multiplying the quantity consumed of each food item by the content of energy and of macro- and micronutrients per 100 g of that item. Second, the foods were grouped into 8 subgroups of meat and fish items and 14 groups of other food items (fruits, vegetables, bread and cereals, milk and milk products, eggs, legumes, nuts and seeds, sweets, sugar sweetened beverages, artificially sweetened sweets and beverages, alcoholic beverages, ethnic dishes, spreads, sauces and spices). Working files were built for each study, containing for each food group the intakes of selected macro- and micronutrients and their densities, namely energy intake, carbohydrates, protein, total and saturated fat, fiber, cholesterol, alcohol, calcium, iron and folic acid (22 food groups × (10 nutrients + 10 densities + energy intake)). Building the database in this way allowed the nutritional exposure to be explored in terms of intake of a food group or its percent of energy, as well as by its contribution of specific nutrients (e.g., iron from meat or fiber from fruit). Descriptive statistics and frequency tables were used to check for errors, inaccuracies and missing data, which were then discussed with the PIs and updated when possible.
A complete dataset including individual-level dietary, sociodemographic and lifestyle information was generated for each participating study. Variable computation, construction of the working files and data analyses were performed using SAS version 9.4.
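The conversion steps described in this section can be illustrated with a minimal sketch. The original work was done in SAS; the Python below is only an illustration, and all function names, the frequency units, the month length (365.25/12 days) and the way the season length enters the calculation (as a fraction of 12 months) are assumptions rather than the authors' exact rules.

```python
# Days covered by each reporting unit (assumed values for this sketch).
DAYS_PER_UNIT = {"day": 1.0, "week": 7.0, "month": 365.25 / 12.0, "year": 365.25}


def daily_grams(
    times: float,
    per: str,
    portions: float,
    portion_grams: float,
    season_months: float = 12.0,
) -> float:
    """Average grams per day from a reported frequency, number of portions
    and portion size; seasonal items are scaled by the season length."""
    frequency_per_day = times / DAYS_PER_UNIT[per]
    season_fraction = season_months / 12.0
    return frequency_per_day * portions * portion_grams * season_fraction


def nutrient_intake(grams: float, amount_per_100g: float) -> float:
    """Nutrient (or energy) intake from the grams consumed and the content per 100 g."""
    return grams * amount_per_100g / 100.0


def density_per_1000kcal(nutrient_amount: float, energy_kcal: float) -> float:
    """Nutrient density: amount of the nutrient per 1000 kcal of energy intake."""
    if energy_kcal <= 0:
        raise ValueError("energy_kcal must be positive")
    return nutrient_amount / energy_kcal * 1000.0


# Made-up example: one 150 g portion eaten 3 times a week,
# with an illustrative iron content of 1.2 mg per 100 g.
grams = daily_grams(times=3, per="week", portions=1, portion_grams=150.0)
iron_mg = nutrient_intake(grams, amount_per_100g=1.2)
iron_density = density_per_1000kcal(iron_mg, energy_kcal=2000.0)
```

Keeping intake and density as separate quantities mirrors the working files described above, in which each food group carries both absolute intakes and densities.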

Statistical analysis
The current study presents descriptive statistics of sociodemographic and anthropometric characteristics of each study, as well as nutritional data.
Meat consumption variables were studied by quartiles of total meat, red meat, processed meat and poultry, and the median value of each quartile is presented by study (Table 3). In addition, meat consumption, nutrient intake and dietary intake variables were studied as continuous variables. Each study provides a mean consumption with a different precision, which depends partly on the sample size and partly on the variance of the dietary intakes reported in that study. Therefore, to characterize meat consumption in the pooled study population, we applied a weighting method that gives estimates with higher precision a higher weight. For each meat consumption variable, the mean $\bar{x}_i$ and standard error $se_i$ were calculated in each study $i$, and each study was assigned the weight $w_i = 1/se_i^2$. The weighted mean was calculated as $\bar{x}_w = \sum_i w_i \bar{x}_i / \sum_i w_i$ (Table 4), and its standard error as 1 divided by the square root of the sum of the weights, $SE(\bar{x}_w) = 1/\sqrt{\sum_i w_i}$.
Weighted means were calculated for men and women separately, and by type of study questionnaire (FFQ or 24HR recall). A weighted mean was also calculated for each quartile of meat consumption, as described above. Selected nutrient intakes were compared across total meat quartiles using a test for linear trend (Table 5).
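The weighting scheme described above (weight equal to 1 divided by the squared standard error, and a pooled standard error equal to 1 divided by the square root of the sum of weights) can be sketched in a few lines. The function name and the example numbers are hypothetical and are not taken from the study data; the authors' own computations were done in SAS.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_weighted_mean(
    means: Sequence[float], standard_errors: Sequence[float]
) -> Tuple[float, float]:
    """Pool study-level means using weights w = 1 / se**2.

    Returns the weighted mean and its standard error, 1 / sqrt(sum of weights).
    """
    if len(means) != len(standard_errors) or len(means) == 0:
        raise ValueError("means and standard_errors must be non-empty and of equal length")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    pooled_se = 1.0 / math.sqrt(total_weight)
    return pooled_mean, pooled_se


# Made-up example: two studies with mean intakes of 10 and 20 gr/day
# and standard errors of 1 and 2, respectively.
pooled_mean, pooled_se = inverse_variance_weighted_mean([10.0, 20.0], [1.0, 2.0])
```

In this made-up example the weights are 1 and 0.25, so the pooled mean lies closer to the more precise study (12 rather than the unweighted 15).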

Results
Table 1 summarizes the characteristics of the participating studies. Five of the studies included both men and women, one included men only and one women only. Three of the studies included both Jewish and Arab participants.
Recruitment to the different studies spanned 1963 to 2014 (Fig. 1). The nutritional questionnaires were FFQs in five of the studies and 24HR recalls in two. Four different food composition tables were used for the nutrient analysis: three studies used a variation of McCance and Widdowson's The Composition of Foods [21], two studies used custom-built databases based on several sources, and two others were based on the Israeli Ministry of Health food composition tables [20]. These tables provided the relevant food composition available for the time period of the specific study. Demographic characteristics of the population by study are displayed in Table 2. The mean age at study start ranged from 49.3 ± 6.3 years in IIHD to 74.6 ± 6.2 years in Mabat Zahav. Rates of current smoking and of less than 12 years of education were highest in IIHD. Rates of obesity (BMI ≥ 30 kg/m²) were lowest in the IIHD and GOH studies and highest in the HDS and Mabat Zahav studies.
Meat consumption was categorized according to quartiles. The median value of each quartile of total meat intake and of intakes of poultry, red meat and processed meat (gr/day), by study and type of nutritional assessment, is presented in Table 3. In studies using FFQ, 97% to 100% of participants reported consuming any meat; intakes of total meat in the fourth (highest) quartile ranged from 117 gr/day in the Ovary study to 246 gr/day in the GOH study. In studies using 24HR recalls, total meat intakes ranged from 0 gr/day in Q1 (lowest) to 191 gr/day in Q4. Of the meat sub-categories, intakes of poultry were highest across all studies, with median intakes in Q4 ranging from 90 to 186 gr/day, compared to 32 to 82 gr/day of red meat and 34 to 82 gr/day of processed meat. In the studies using 24HR recall, the percent of participants reporting poultry consumption was highest, and Q4 intake of poultry was 146 and 172 gr/day in the Mabat Zahav and NNS studies, respectively.
Table 4 presents intakes of meat categories (gr/day) by nutritional assessment type and sex. In general, men reported consuming more meat than women. This was consistent with the greater mean total energy intake found in men than in women for both types of nutritional assessment. In studies using FFQ, women reported consuming more processed poultry and organ meats than men.
BMI and nutrient intake by quartile of total meat consumption are shown in Table 5. In general, absolute nutrient consumption was greater with greater intake of total meat, indicating a generally higher food intake. Percent energy from carbohydrates and intakes per 1000 kcal of dietary fiber, iron, calcium and folic acid were negatively associated with increasing quartiles of meat intake (p for linear trend < 0.0001 for all), while median BMI was greater with greater meat intake (from 26.7 to 27.2, p for linear trend < 0.0001).

Discussion
We report here on the dietary intake of more than 20,000 participants of seven studies that used different nutritional questionnaires and food composition tables and spanned over five decades. This methodological heterogeneity presented a great challenge to the harmonization of nutritional data, especially since the main exposure of interest was at the food level (red and processed meat). Several strategies were employed to deal with this challenge, including revision of food-level data, standardization of dietary and sociodemographic variables, and the creation of a uniform system of food grouping that was applied to all studies.

Nutritional data harmonization
This methodological paper is part of a larger project aimed at studying the association of red and processed meat with gastrointestinal cancers, and for this reason it focused on meat intake. We found that over 95% of participants reported consuming any meat in studies using FFQ, and that consumption of poultry was the highest of all meat sub-categories in all studies. Total meat intake was similar across studies using FFQ and 24HR recall questionnaires, while more differences were observed in the sub-categories of meat. In the processed meat category, higher intakes were found in studies using FFQ, probably because specific meat items were classified differently owing to the lack of full information in the FFQ on mode and place of preparation.
Other studies applying harmonization to previously collected data faced similar challenges and similarly aggregated food items into food groups to create a uniform system [22,23]. Most previous studies merged dietary data from FFQs only [22–24]. Olsen et al. harmonized data from two large birth cohorts that used similar questions and software for calculation of food intake, demonstrating the advantage of pooling data from studies with comparable nutritional questionnaires. Similarly to the current study, the EURALIM collaboration attempted to harmonize dietary data from six surveys, of which one used a 24HR recall, one used repeated 1-day diet records and the remaining four used FFQs; the authors concluded that the methodological heterogeneity was too great to compare dietary measures directly [25]. Following the harmonization of data in the current study, we observed similarities between the two types of nutritional questionnaires in mean intakes of total meat, red meat and poultry, while differences were apparent in sub-groups (i.e., processed meat). These differences were taken into account and led to a separate analysis by type of nutritional tool.

Dietary intake
In the current study, red meat intake was 22.9 gr/day among women and 38.6 gr/day among men in studies using FFQ, while in studies using 24HR recall it was 18.0 and 33.0 gr/day in women and men, respectively. This higher meat consumption in men aligns with the higher total energy intake in men than in women, seen both in our study and in previous reports [26]. The slightly higher consumption of processed meat among women than among men according to the FFQ, mainly due to processed poultry and organ meats, may be explained by reporting differences. A systematic review by Lee (2016) highlights the impact of gender differences in FFQ reporting, with greater inaccuracy in dietary intake assessment in women [27].
In addition, pork was consumed by very few participants (data not shown) and alcohol intake was low. Thus, this study confirms findings from previous reports and studies on the dietary habits of the Israeli population, which include avoidance of pork and of alcohol. These habits are related mainly to the religious observance laws of the Jewish and Muslim populations, and reflect the older age groups of participants in the studies. According to the OECD's meat consumption indicator, which compares food purchasing per capita, Israel leads in purchases of poultry, while its pork purchases are among the lowest worldwide [28]. In 2019, pork consumption reached only 1.2 kg/capita in Israel, compared with 23.8 kg/capita in the USA. For poultry, consumption was 68.7 kg/capita in Israel compared with 50.9 kg/capita in the USA. Annual beef consumption was similar in Israel and the USA, at 24.1 and 26.0 kg/capita, respectively. In studies of individual dietary data, average red meat intake in the U.S., as reported by NHANES III, was 69.8 gr daily [29]. In European countries, intakes ranged from 71 gr/day in the UK to 97.6 gr/day in Sweden [30,31]. A recent Israeli case-control study reported red meat intake of approximately 23 gr daily (1.3 portions/week) by Jewish participants and 53 gr daily (3 portions/week) by Arab participants [32]. The relatively low meat intake and other unique dietary habits highlight the need to explore the relationship between dietary intake and health outcomes in ethnically and culturally diverse populations.
We further compared nutrient intake by quartiles of total meat consumption among studies using FFQ. In absolute terms, those who consumed more meat ate more of all nutrients. In relative terms (as percent of total energy or per 1000 kcal), those who consumed more meat ate relatively more protein and fat and less fiber, calcium, iron and folic acid. Our finding of relatively lower iron consumption among those in the higher quartiles of meat intake is surprising, as meat is a key source of this nutrient. However, in our population, poultry, which is relatively low in iron [33], was the main type of meat consumed. These differences in nutrient intake indicate that consuming more meat may be linked to other food choices, and demonstrate how the general dietary pattern may differ between low and high meat consumers.
An important advantage of the current collaboration is the large number of participants with diverse sociodemographic backgrounds, which allowed us to explore a wide range of diets. The use of previously collected data enabled us to study long-term effects of the diet efficiently, in a relatively short time and at low cost. In addition, the inclusion of 7 studies conducted in different decades allowed representation of the changes in dietary habits and in the composition of foods that occurred over a long period of time.
The current study has several limitations, mainly the heterogeneity of the nutritional questionnaires used by the different studies (FFQ and 24HR recall), which may lead to discrepancies in exposure assessment. In Israel, meat is not consumed daily by most people, poultry being more commonly consumed (as seen in Table 3, % consumers). The 24HR instrument fails to capture intake of episodically consumed foods (i.e., those that are not consumed every day by most people), while the FFQ has the strength of querying long-term intake, thereby aiming to obtain data on usual intake with a single administration. This explains the differences in the proportion of meat consumers between the two nutritional instruments. Nevertheless, an analysis performed by questionnaire type revealed similar patterns of food consumption by sex in the two types of questionnaires used in our study (as seen in Table 4). The collection of nutritional data only once in the participant's life, which provides a snapshot of the person's dietary intake [34], is another limitation of the present study.

Conclusions
The methodology described in this paper may be applicable in other settings, and we demonstrate the feasibility of addressing challenges in the harmonization of nutritional databases. Using food-level dietary exposure data from different studies offers a unique opportunity to examine links between a wide range of dietary exposures and the outcome of interest. Studying populations with a unique food culture is important, as it allows otherwise rare exposures to be examined and enables dietary recommendations to be tailored to the population of interest.

Table 1
Description of the participating studies' characteristics and their nutritional questionnaires. Abbreviations: IIHD Israel Ischemic Heart Study, GOH Glucose Intolerance, Obesity, and Hypertension study, NNS the Negev Nutritional Study, NICCCS Northern Israel Cancer Case Control Studies, HDS Hadera District Study, sq-FFQ semi-quantitative food frequency questionnaire, q-FFQ quantitative food frequency questionnaire

Table 2
Baseline demographic characteristics of the study population by study. Europe/America, Africa/Asia and Israel refer to Jewish participants only. Abbreviations: IIHD Israel Ischemic Heart Study, GOH Glucose Intolerance, Obesity, and Hypertension study, NNS the Negev Nutritional Study, NICCCS Northern Israel Cancer Case Control Studies, HDS Hadera District Study

Table 3
Meat consumption (median for quartile) by study. Abbreviations: FFQ Food Frequency Questionnaire, IIHD Israel Ischemic Heart Study, GOH Glucose Intolerance, Obesity, and Hypertension study, NICCCS Northern Israel Cancer Case Control Studies, HDS Hadera District Study, NNS the Negev Nutritional Study

Table 4
Intake (gram per day) for each meat category according to questionnaire type and sex. Values are weighted mean a ± SE b.
a The weighted mean was calculated by weighting the mean of each separate study according to its standard error, the weight being $1/se^2$
b The standard error was calculated as 1 divided by the square root of the sum of weights
c Beef/pork/lamb
d Chicken/Turkey/Duck

Table 5
BMI, energy intake, and selected nutrients intake by quartiles of total meat consumption in pooled FFQ data.
** p-value of a test for linear trend (see appendix #2) across the quartiles < 0.001 for all characteristics
a Differences in number of participants with available data are due to missing information on specific nutrients in some of the studies
b For each study separately, the mean and standard error of intake were calculated in each quartile of meat consumption. Then, for each quartile, a weighted mean and standard error were calculated as in Table 4